597 research outputs found

Learning associations between clinical information and motion-based descriptors using a large scale MR-derived cardiac motion atlas

Author: D Peressutti, D Rueckert, H Hotelling, I Oksuz, K Mukamal, M Sinclair, N Jitnarin, P Vincent, S Petersen, S Roweis, W Bai
Publication date: 27/07/2018

The availability of large-scale databases containing imaging and non-imaging data, such as the UK Biobank, represents an opportunity to improve our understanding of healthy and diseased bodily function. Cardiac motion atlases provide a space of reference in which the motion fields of a cohort of subjects can be directly compared. In this work, a cardiac motion atlas is built from cine MR data from the UK Biobank (~6000 subjects). Two automated quality control strategies are proposed to reject subjects with insufficient image quality. Based on the atlas, three dimensionality reduction algorithms are evaluated to learn data-driven cardiac motion descriptors, and statistical methods are used to study the association between these descriptors and non-imaging data. Results show a positive correlation between the atlas motion descriptors and body fat percentage, basal metabolic rate, hypertension, smoking status and alcohol intake frequency. The proposed method outperforms ejection fraction, the most commonly used descriptor of cardiac function, in identifying changes in cardiac function due to these known cardiovascular risk factors. In conclusion, this work represents a framework for further investigation of the factors influencing cardiac health.
Comment: 2018 International Workshop on Statistical Atlases and Computational Modeling of the Heart

arXiv.org e-Print Archive

Crossref

King's Research Portal

Blind Normalization of Speech From Different Channels

Author: Boll S. F., Cox R. V., David N. Levin, Gales M. J. F., Levin D. N., Roweis S. T., Tenenbaum J. B., Young S.
Publication venue: Acoustical Society of America (ASA)
Publication date: 02/04/2002

We show how to construct a channel-independent representation of speech that has propagated through a noisy reverberant channel. This is done by blindly rescaling the cepstral time series by a non-linear function, with the form of this scale function being determined by previously encountered cepstra from that channel. The rescaled form of the time series is an invariant property of it in the following sense: it is unaffected if the time series is transformed by any time-independent invertible distortion. Because a linear channel with stationary noise and impulse response transforms cepstra in this way, the new technique can be used to remove the channel dependence of a cepstral time series. In experiments, the method achieved greater channel-independence than cepstral mean normalization, and it was comparable to the combination of cepstral mean normalization and spectral subtraction, despite the fact that no measurements of channel noise or reverberations were required (unlike spectral subtraction).
Comment: 25 pages, 7 figures
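For orientation, the baseline named in this abstract, cepstral mean normalization, can be sketched in a few lines. This is only an illustration of that baseline, not the paper's blind rescaling method. It assumes a noise-free linear channel, which adds a constant offset to every cepstral frame; the function name and the toy data are hypothetical.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean over time from every cepstral frame."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy data (hypothetical): three frames with two cepstral coefficients each.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# Assumed channel effect: the same offset added to every frame.
channel_offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)
```

Under that assumption both normalized sequences coincide, because the constant offset is absorbed into the mean. Noise and reverberation break the assumption, which is the situation the abstract addresses.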

arXiv.org e-Print Archive

Crossref

On Optimizing Locally Linear Nearest Neighbour Reconstructions Using Prototype Reduction Schemes

Author: B.V. Dasarathy, C.G. Atkeson, C.J.C. Burges, C.L. Chang, G.L. Ritter, I. Tomek, J.C. Bezdek, K. Fukunaga, P. Kang, P.E. Hart, S.-W. Kim, S.T. Roweis
Publication venue: Springer Berlin / Heidelberg
Publication date: 01/01/2010

This paper concerns the use of Prototype Reduction Schemes (PRS) to optimize the computations involved in typical k-Nearest Neighbor (k-NN) rules. These rules have been used successfully for decades in statistical Pattern Recognition (PR) and have numerous applications because of their known error bounds. For a given data point of unknown identity, the k-NN rule combines information about the a priori target classes (values) of selected neighbors to, for example, predict the target class of the tested sample. Recently, an implementation of the k-NN, named Locally Linear Reconstruction (LLR) [11], has been proposed. Its salient feature is that, by invoking a quadratic optimization process, it is capable of systematically setting model parameters, such as the number of neighbors (specified by the parameter k) and the weights. However, the LLR takes more time than other conventional methods when applied to classification tasks. To overcome this problem, we propose a strategy of using a PRS to solve the optimization problem efficiently. We demonstrate, first of all, that by completely discarding the points not included by the PRS, we obtain a reduced set of sample points with which the quadratic optimization problem can be computed far more expediently. The values of the corresponding indices are comparable to those obtained with the original training set (i.e., the one which considers all the data points), even though the computations required to obtain the prototypes and the corresponding classification accuracies are noticeably less. The proposed method has been tested on artificial and real-life data sets; the results obtained are very promising, and the method has potential in PR applications.
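To make the idea of prototype reduction concrete, here is a minimal sketch of one classic scheme, a Hart-style condensed nearest neighbour rule. The abstract does not say which PRS the paper uses, so this choice is an assumption, and the function names and toy data are hypothetical. The LLR quadratic optimization itself is not shown.

```python
import math


def nearest_label(prototypes, point):
    """1-NN rule: return the label of the prototype closest to `point`."""
    return min(prototypes, key=lambda proto: math.dist(proto[0], point))[1]


def condense(samples):
    """Hart-style condensing: keep only samples that the current prototype
    set misclassifies, repeating until a full pass adds nothing."""
    prototypes = [samples[0]]
    changed = True
    while changed:
        changed = False
        for point, label in samples:
            if (point, label) in prototypes:
                continue  # already kept; also guards against endless re-adding
            if nearest_label(prototypes, point) != label:
                prototypes.append((point, label))
                changed = True
    return prototypes


# Toy training set (hypothetical): two well-separated classes.
samples = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b"),
]
prototypes = condense(samples)
```

Any neighbour-based rule can then be run against `prototypes` instead of `samples`, which is where the computational saving described in the abstract would come from.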

Crossref

Carleton University's Institutional Repository

NORA - Norwegian Open Research Archives

Agder University Research Archive

Enhancing network embedding with implicit clustering

Author: A Zhang, B Yoshua, J-H Li, M Belkin, Q Li, S Wang, ST Roweis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Network embedding aims at learning low-dimensional representations of nodes. These representations can be widely used for network mining tasks, such as link prediction, anomaly detection, and classification. Recently, a great deal of meaningful research work has been carried out on this emerging network analysis paradigm. Real-world networks contain clusters of different sizes because their edges carry different relationship types. These clusters also reflect some features of nodes, which can contribute to the optimization of the feature representation of nodes. However, existing network embedding methods do not distinguish these relationship types. In this paper, we propose an unsupervised network representation learning model that can encode edge relationship information. Firstly, an objective function is defined, which can learn the edge vectors by implicit clustering. Then, a biased random walk is designed to generate a series of node sequences, which are put into Skip-Gram to learn the low-dimensional node representations. Extensive experiments are conducted on several network datasets. Compared with the state-of-the-art baselines, the proposed method is able to achieve favorable and stable results in multi-label classification and link prediction tasks.

Crossref

OPUS - University of Technology Sydney

University of Tasmania Open Access Repository

Multivariate multi-way analysis of multi-source data

Author: Brites, HOTELLING, I. Huopaniemi, J. Nikkila, Kotronen, Le Cao, Lucas, M. Oresic, Roweis, S. Kaski, Summers, T. Suvitaival
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Analysis of variance (ANOVA)-type methods are the default tool for the analysis of data with multiple covariates. These tools have been generalized to the multivariate analysis of high-throughput biological datasets, where the main challenge is the problem of small sample size and high dimensionality. However, the existing multi-way analysis methods are not designed for the increasingly important experiments where data is obtained from multiple sources. Common examples of such settings include integrated analysis of metabolic and gene expression profiles or, in our case, metabolic profiles from several tissues, in a controlled multi-way experimental setup where disease status, medical treatment, gender and time-series are usual covariates.

Crossref

PubMed Central

VTT Research System

Applications of Information Theory to Analysis of Neural Data

Author: A Belitski, A Kraskov, C Magri, C Stosiek, D Ruyter, G Buzsáki, GT Einevoll, HS Seung, J Oñativia, MA Montemurro, R Quian Quiroga, RAA Ince, S Panzeri, SR Schultz, ST Roweis, T-W Chen, W Denk
Publication venue: Springer Science and Business Media LLC
Publication date: 08/01/2015

Information theory is a practical and theoretical framework developed for the study of communication over noisy channels. Its probabilistic basis and capacity to relate statistical structure to function make it ideally suited for studying information flow in the nervous system. It has a number of useful properties: it is a general measure sensitive to any relationship, not only linear effects; it has meaningful units which in many cases allow direct comparison between different experiments; and it can be used to study how much information can be gained by observing neural responses in single trials, rather than in averages over multiple trials. A variety of information theoretic quantities are commonly used in neuroscience (see entry "Definitions of Information-Theoretic Quantities"). In this entry we review some applications of information theory in neuroscience to study encoding of information in both single neurons and neuronal populations.
Comment: 8 pages, 2 figures
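As a small illustration of the kind of quantity this entry discusses, the sketch below computes a plug-in estimate of the mutual information between discrete stimuli and discrete responses, in bits. It is a generic textbook estimator, not a method taken from the entry; the function name and the trial data are hypothetical. Plug-in estimates are known to be biased upward when trials are few, and no bias correction is attempted here.

```python
import math
from collections import Counter


def mutual_information_bits(stimuli, responses):
    """Plug-in estimate of I(S; R) in bits from paired discrete observations."""
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")
    n = len(stimuli)
    joint_counts = Counter(zip(stimuli, responses))
    stim_counts = Counter(stimuli)
    resp_counts = Counter(responses)
    info = 0.0
    for (s, r), count in joint_counts.items():
        p_joint = count / n
        # p(s, r) / (p(s) * p(r)) written directly in terms of counts.
        ratio = count * n / (stim_counts[s] * resp_counts[r])
        info += p_joint * math.log2(ratio)
    return info


# Hypothetical single-trial data: the response copies a binary stimulus exactly.
informative = mutual_information_bits([0, 1, 0, 1], [0, 1, 0, 1])
# Hypothetical data in which the response is independent of the stimulus.
uninformative = mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1])
```

The result is expressed in bits, which is what allows the direct comparison between experiments that the abstract mentions.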

arXiv.org e-Print Archive

Crossref

Validation of nonlinear PCA

Author: A Herman, A Ilin, AN Gorban, B Chalmond, B Christiansen, B Efron, B Schölkopf, BW Lu, D DeMers, JB Tenenbaum, LK Saul, M Scholz, MA Kramer, Matthias Scholz, MR Hestenes, ND Lawrence, P Demartines, R Hecht-Nielsen, S Girard, S Harmeling, S Mika, ST Roweis, T Hastie, T Kohonen, WW Hsieh
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. However, the benefit of curved components requires careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and more generally the use of an independent test set, fail when applied to nonlinear PCA because of its inherent unsupervised characteristics. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity.
Comment: 12 pages, 5 figures

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Fondazione Edmund Mach

Estimating the intrinsic dimension of datasets by a minimal neighborhood information

Author: C Ceruti, GA Tribello, JB Tenenbaum, L Molgedey, M Chen, M Fan, P Grassberger, R Badii, S Piana, ST Roweis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a manifold whose Intrinsic Dimension (ID) is much lower than the crude large number of coordinates. Such a manifold is generally twisted and curved; in addition, points on it will be non-uniformly distributed: two factors that make the identification of the ID and its exploitation difficult. Here we propose a new ID estimator using only the distance of the first and the second nearest neighbor of each point in the sample. This extreme minimality enables us to reduce the effects of curvature, of density variation, and the resulting computational cost. The ID estimator is theoretically exact in uniformly distributed datasets, and provides consistent measures in general. When used in combination with block analysis, it allows discriminating the relevant dimensions as a function of the block size. This allows estimating the ID even when the data lie on a manifold perturbed by a high-dimensional noise, a situation often encountered in real world data sets. We demonstrate the usefulness of the approach on molecular simulations and image analysis.
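The two-nearest-neighbour idea can be sketched as follows. The sketch assumes that, for locally uniform density, the ratio of second to first neighbour distance follows a Pareto law whose exponent is the intrinsic dimension, and it uses the maximum-likelihood estimate N / sum(log ratio). The abstract does not say which fitting procedure the paper uses, so the maximum-likelihood form, the brute-force neighbour search, and all names below are assumptions made for illustration.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """Estimate intrinsic dimension from the ratio of each point's second
    to first nearest-neighbour distance (maximum-likelihood form)."""
    log_ratios = []
    for i, p in enumerate(points):
        first = second = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < first:
                first, second = d, first
            elif d < second:
                second = d
        if first > 0.0 and math.isfinite(second):
            log_ratios.append(math.log(second / first))
    total = sum(log_ratios)
    if total <= 0.0:
        raise ValueError("need distinct points with distinct neighbour distances")
    return len(log_ratios) / total


# Hypothetical test data: a flat 2-D sheet embedded in 5 coordinates.
rng = random.Random(0)
sheet = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(400)]
estimate = two_nn_intrinsic_dimension(sheet)
```

The brute-force search costs quadratic time in the number of points; a spatial index would be the natural replacement for larger samples. The block analysis mentioned in the abstract is not reproduced here.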

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Sissa Digital Library

Metrics for vector quantization-based parametric speech enhancement and separation

Author: Ellis D. P. W., Kleijn W. B., Kuropatwinski M., Mads Græsbøll Christensen, Radfar M. H., Roweis S. T., Vafin R., van de Par S.
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/2013

Crossref

VBN